Scrapy cannot find div on this website [on hold]

Posted by Jaspal Singh Rathour on Programmers
Published on 2014-08-24T22:57:43Z Indexed on 2014/08/25 4:31 UTC

Filed under: web-scraping

I am very new to this and am trying to get my head around my first selector. Can somebody help? I am trying to extract data from the page http://groceries.asda.com/asda-webstore/landing/home.shtml?cmpid=ahc--ghs-d1--asdacom-dsk-_-hp#/shelf/1215337195041/1/so_false

I want all the info under the div with class="listing clearfix shelfListing", but I can't figure out how to format the response.xpath() call.

I have managed to launch the Scrapy shell, but no matter what I type into response.xpath(), I can't select the right node. I know the selector works in general, because when I type >>> response.xpath('//div[@class="container"]') I get a result, but I don't know how to navigate from there to the listing clearfix shelfListing div. I am hoping that once I get this bit working I can continue working my way through the spider.
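For illustration, here is a minimal sketch of matching a div by one of several class names. It uses only the standard library and made-up markup (the HTML string, the item names and the divs_with_class helper are all hypothetical), and it assumes the div is actually present in the HTML that gets downloaded. The part of the URL after # is never sent to the server, so if the shelf listing is filled in by JavaScript it will not be in the response at all, whatever XPath is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the downloaded page; the real
# page structure is an assumption.
HTML = """
<html><body>
  <div class="container">
    <div class="listing clearfix shelfListing">
      <span class="title">Item A</span>
      <span class="title">Item B</span>
    </div>
    <div class="listing other">
      <span class="title">Item C</span>
    </div>
  </div>
</body></html>
""".strip()


def divs_with_class(root, wanted):
    """Return every <div> whose class attribute contains the token `wanted`."""
    return [
        div for div in root.iter("div")
        if wanted in div.get("class", "").split()
    ]


root = ET.fromstring(HTML)
shelf = divs_with_class(root, "shelfListing")
titles = [span.text for div in shelf for span in div.iter("span")]
print(titles)

# In the Scrapy shell, a comparable XPath would be along the lines of:
#   response.xpath('//div[contains(concat(" ", normalize-space(@class), " "), " shelfListing ")]')
# or an exact match on the full attribute value:
#   response.xpath('//div[@class="listing clearfix shelfListing"]')
```

Matching on a single class token is usually less brittle than matching the whole attribute string, since the exact form only works when the class attribute is spelled and ordered exactly as written.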

Thank you in advance!

PS: I wonder whether it is simply not possible to scrape this site. Can the owners block spiders?
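On the blocking question, one mechanism site owners can use is a robots.txt file, which well-behaved crawlers check before fetching (Scrapy has a ROBOTSTXT_OBEY setting for this). The sketch below shows such a check with Python's standard urllib.robotparser. The rules, the "my-spider" user agent and the example.com URLs are made up for illustration and are not taken from the site in question. robots.txt is only advisory, and a site may also block crawlers by other means.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real site's rules are not known here.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "my-spider", "http://example.com/shelf/1"))
print(allowed(ROBOTS_TXT, "my-spider", "http://example.com/private/x"))
```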

Scrapy cannot find div on this website [on hold]

Posted by Jaspal Singh Rathour on Pro Webmasters
Published on 2014-08-24T22:59:46Z Indexed on 2014/08/25 4:33 UTC

Filed under: Xml | python | scraper-sites

